    Unsupervised learning of relation detection patterns

    Information extraction is the area of natural language processing whose goal is to obtain structured data from the relevant information contained in textual fragments. It requires a significant amount of linguistic knowledge. The specificity of this knowledge hampers the portability of systems, since a change of language, domain or style demands costly human effort. Machine learning techniques have been applied for decades to overcome this portability bottleneck, progressively reducing the amount of human supervision involved. However, as the availability of large document collections increases, completely unsupervised approaches become necessary in order to mine the knowledge contained in them. The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation detection. Achieving this goal required, first, considering the different strategies by which this combination could be carried out; second, developing or adapting clustering algorithms suited to our needs; and third, devising pattern learning procedures that incorporate clustering information. By the end of this thesis, we had developed and implemented an approach to learning relation detection patterns which, using clustering techniques and minimal human supervision, is competitive with and even outperforms comparable state-of-the-art approaches.
    Postprint (published version)

    Una aproximació d'aprenentatge automàtic per a extracció d'informació adaptativa [A machine learning approach to adaptive information extraction]

    Clustering techniques can help reduce supervision in pattern acquisition processes for Information Extraction. This work, which spans four years of research, begins by studying the document representation best suited to the clustering task. To avoid the biases of individual clustering methods, ensemble clustering methods are considered. Several supervised combination methods are explored, and automatic strategies for determining the number of clusters in the combination are added. Mechanisms for obtaining weighted ensemble clusterings are also considered, as well as unsupervised combination strategies. Finally, the clustering results are used in a pattern acquisition system to replace the elements of human supervision. All these strategies and methods are evaluated on document clustering and pattern acquisition tasks using real data. The results show that words as a document representation outperform other models for the clustering task, that ensemble clustering overcomes the limitations of individual clusterings, and that unsupervised pattern acquisition strategies obtain results competitive with supervised strategies.

    Kernels semàntics per a clustering de patrons [Semantic kernels for pattern clustering]

    Report prepared following a stay at the Proteus project of New York University between April and June 2007. Clustering techniques can help reduce supervision in pattern acquisition processes for Information Extraction. However, algorithms suited to documents are needed, and these algorithms require adequate measures of similarity between patterns. Kernels can offer a solution to these problems, but unsupervised learning requires cleverer strategies than supervised learning to incorporate larger amounts of information. In this report, the result of my stay at the Proteus project of New York University from April to June 2007, several kernels over patterns are proposed and evaluated. Kernels are first studied with a restricted family of patterns, and then kernels already used in supervised Information Extraction tasks are applied. Because clustering performance degrades when irrelevant information is added, the kernels are simplified and strategies are sought to incorporate semantics selectively. Finally, the effect of clustering the semantic knowledge as a step prior to pattern clustering is studied. The various strategies are evaluated on document and pattern clustering tasks using real data.

    Clustering no paramétrico de documentos mediante métodos de consenso [Non-parametric document clustering by ensemble methods]

    The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering.
    This work has been partially funded by the European CHIL Project (IP-506909); the Commissionate for Universities and Research of the Department of Innovation, Universities and Enterprises of the Catalan Government; and the European Social Fund.

    Unsupervised ensemble minority clustering

    Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise. The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms. In this work, we propose a novel ensemble minority clustering algorithm, EWOCS, suitable for weak clustering combination. Its properties have been theoretically proved under a loose set of constraints. We also propose a number of weak clustering algorithms, and an unsupervised procedure to determine the scaling parameters for Gaussian kernels used within the task. We have implemented a number of approaches built from the proposed components, and evaluated them on a collection of datasets.
    Peer Reviewed

    ParTes: Test suite para evaluación de analizadores sintácticos [ParTes: Test suite for parser evaluation]

    This paper presents ParTes, the first test suite in Spanish and Catalan for the qualitative evaluation of parsers. The resource is a hierarchical test suite of representative syntactic structure and argument order phenomena. ParTes simplifies qualitative evaluation by contributing to the automation of this task.
    The resource presented in this paper arises from the research project SKATeR (Ministry of Economy and Competitiveness, TIN2012-38584-C06-06 and TIN2012-38584-C06-01).